Abstract

Integrated powers of densities of one or two multidimensional random variables appear in a variety of problems in mathematical statistics, information theory, and computer science. We study $U$-statistic estimators for a class of such integral functionals based on the $\varepsilon$-close vector observations in the corresponding independent and identically distributed samples. We show some asymptotic properties of these estimators (e.g., consistency and asymptotic normality). The results can be used in a variety of problems in mathematical statistics and computer science (e.g., distribution identification problems, approximate matching for random databases, two-sample problems).

Keywords: $U$-statistics, estimation of divergence, density power divergence, asymptotic normality, entropy estimation, Rényi entropy



1. Introduction 

Let the distributions $\mathcal{P}_X$ and $\mathcal{P}_Y$ of the $d$-dimensional random variables $X$ and $Y$ have densities $p_X(x)$ and $p_Y(x)$, $x \in R^d$, respectively. Various characteristics in mathematical statistics, information theory, and computer science, say entropy-type integral functionals, are expressed in terms of integrated powers of $p_X(x)$ and $p_Y(x)$. For example, a widely accepted measure of closeness between $\mathcal{P}_X$ and $\mathcal{P}_Y$ is the (quadratic) density power divergence (Basu et al., 1998)

$$D_2 = D_2(\mathcal{P}_X, \mathcal{P}_Y) := \int_{R^d} \big(p_X(x) - p_Y(x)\big)^2\,dx.$$

Other examples include the Rényi entropy for quantifying uncertainty in $\mathcal{P}_X$ (Rényi, 1970)

$$h_s = h_s(\mathcal{P}_X) := \frac{1}{1-s}\log\Big(\int_{R^d} p_X(x)^s\,dx\Big), \quad s \neq 1,$$

and the differential variability for some database problems (Seleznjev and Thalheim, 2010)

$$v = v(\mathcal{P}_X, \mathcal{P}_Y) := -\log\Big(\int_{R^d} p_X(x)\,p_Y(x)\,dx\Big).$$
Henceforth we use $\log x$ to denote the natural logarithm of $x$. For non-negative integers $k_1, k_2 \geq 0$, $\mathbf{k} := (k_1, k_2)$, we consider the Rényi entropy functionals (Källberg et al., 2012)

$$q_{\mathbf{k}} = q_{k_1,k_2} := \int_{R^d} p_X(x)^{k_1}\,p_Y(x)^{k_2}\,dx.$$

Moreover, given a set of constants $a := \{a_0, a_1, a_2\}$, we introduce the related quadratic functionals

$$q_2 = q_2(a) := a_0 q_{2,0} + a_1 q_{1,1} + a_2 q_{0,2}.$$

Note that the quadratic divergence $D_2 = q_{2,0} - 2q_{1,1} + q_{0,2}$, the Rényi entropy $h_k = \log(q_{k,0})/(1-k)$, $k = 2, 3, \ldots$, and the variability $v = -\log(q_{1,1})$. Some applications of Rényi entropy and divergence measures can be found, e.g., in information theoretic learning (Principe, 2010). More applications of entropy and divergence in statistics (e.g., classification, distribution identification problems, and statistical inference), computer science (e.g., average case analysis for random databases, pattern recognition, and image matching), and econometrics are discussed, e.g., in Kapur (1989), Kapur and Kesavan (1992), Pardo (2006), Leonenko et al. (2008), Escolano et al. (2009), Seleznjev and Thalheim (2003, 2010), Thalheim (2000), Leonenko and Seleznjev (2010), Neemuchwala et al. (2005), and Ullah (1996). The divergence $D_2$ belongs to a subclass of the Bregman divergences that find various applications in statistics (see, e.g., Basseville, 2010, and references therein).

In this paper, to demonstrate the general approach, we study non-parametric estimation of some entropy-type integral functionals, e.g., $q_{\mathbf{k}}$ and $q_2(a)$, using independent samples from $\mathcal{P}_X$ and $\mathcal{P}_Y$. Some new asymptotic results are presented for a class of $U$-statistic estimators for these functionals. These estimators are based on the $\varepsilon$-close observations in the corresponding samples. We generalize some results and techniques proposed in Leonenko and Seleznjev (2010) and Källberg et al. (2012). In particular, we obtain consistency of the corresponding estimators for a wider class of distributions and prove asymptotic normality of the estimators for the quadratic functionals $q_2(a)$.

Leonenko et al. (2008) study asymptotic properties of nearest-neighbor estimators for $q_{\mathbf{k}}$, and obtain consistency when the densities are bounded. Giné and Nickl (2008) show asymptotic normality for a kernel estimator of $q_{2,0}$ in the one-dimensional case. Ahmad and Cerrito (1993) and Li (1996, 1999) use kernel estimates of the quadratic divergence $D_2$ as test statistics for the two-sample problem, and obtain an asymptotically normal null distribution. For a certain kernel estimator, we prove asymptotic normality under different conditions. The number of small interpoint distances in a random sample is among the most studied examples of $U$-statistics with kernels varying with the sample size (see, e.g., Weber, 1983, Jammalamadaka and Janson, 1986, Penrose, 1995). A significant feature of such characteristics is that one can obtain normal limit laws even in some degenerate cases. We generalize this approach to two-sample statistics. This extension enables some statistical applications where the degeneracy condition might be crucial, e.g., estimation of divergence.

First we introduce some notation. Throughout the paper, we assume that the random vectors $X$ and $Y$ are independent. Let $d(x, y) := |x - y|$ denote the Euclidean distance in $R^d$ and define $B_\varepsilon(x) := \{y : d(x, y) < \varepsilon\}$ to be an open $\varepsilon$-ball in $R^d$ with center at $x$ and radius $\varepsilon$. Denote by $b_\varepsilon(d) := \varepsilon^d b_1(d)$, $b_1(d) = 2\pi^{d/2}/(d\,\Gamma(d/2))$, the volume of the $\varepsilon$-ball. Define the $\varepsilon$-ball probability as

$$p_{X,\varepsilon}(x) := P\{X \in B_\varepsilon(x)\}.$$

We say that the vectors $x$ and $y$ are $\varepsilon$-close if $d(x, y) < \varepsilon$, for some $\varepsilon > 0$. Let $X_1, \ldots, X_{n_1}$ and $Y_1, \ldots, Y_{n_2}$ be mutually independent samples of independent and identically distributed (i.i.d.) observations from $\mathcal{P}_X$ and $\mathcal{P}_Y$, respectively. Define $\mathbf{n} := (n_1, n_2)$, $n := n_1 + n_2$, and say that $\mathbf{n} \to \infty$ if $n_1, n_2 \to \infty$. Let $\varepsilon = \varepsilon(\mathbf{n}) \to 0$ as $\mathbf{n} \to \infty$.
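The notation above can be made concrete with a minimal Python sketch. It only evaluates the stated formulas $b_1(d) = 2\pi^{d/2}/(d\,\Gamma(d/2))$ and $b_\varepsilon(d) = \varepsilon^d b_1(d)$ and the $\varepsilon$-closeness check; the function names are illustrative choices and do not appear in the paper.

```python
import math


def unit_ball_volume(d: int) -> float:
    """Volume b_1(d) = 2*pi^(d/2) / (d*Gamma(d/2)) of the unit ball in R^d."""
    return 2.0 * math.pi ** (d / 2) / (d * math.gamma(d / 2))


def ball_volume(eps: float, d: int) -> float:
    """Volume b_eps(d) = eps^d * b_1(d) of a ball of radius eps in R^d."""
    return eps ** d * unit_ball_volume(d)


def is_eps_close(x, y, eps: float) -> bool:
    """True if the Euclidean distance d(x, y) is strictly below eps."""
    return math.dist(x, y) < eps
```

For $d = 1, 2, 3$ the first function reproduces the familiar values $2$, $\pi$, and $4\pi/3$.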

In what follows, we consider estimation problems for both one and two samples. However, in the statements of results and the proofs it is assumed, for the sake of space and clarity, that two samples are available, i.e., $n_1, n_2 > 0$. This can be done without loss of generality, because in the one-sample case, e.g., estimation of $q_{k_1,0}$, $k_1 \geq 2$, from $X_1, \ldots, X_{n_1}$, an auxiliary sample $Y_1, \ldots, Y_{n_2}$ can be considered.

Denote by $\xrightarrow{D}$ and $\xrightarrow{P}$ convergence in distribution and probability, respectively. For a sequence of random variables $U_n$, $n \geq 1$, we write $U_n = O_P(1)$ as $n \to \infty$ if for any $\delta > 0$ and large enough $n \geq 1$, there exists $C > 0$ such that $P(|U_n| > C) \leq \delta$. Moreover, for a numerical sequence $w_n$, $n \geq 1$, let $U_n = O_P(w_n)$ as $n \to \infty$ if $U_n/w_n = O_P(1)$ as $n \to \infty$.

The remaining part of the paper is organized as follows. In Section 2 we consider estimation of the Rényi entropy functional $q_{\mathbf{k}}$. The asymptotic results for estimation of the quadratic functional $q_2(a)$ are given in Section 3. In Section 4 we discuss applications of the obtained results to estimation of density power divergence, the two-sample problem, and statistical inference for some entropy-type characteristics. Several numerical examples illustrate the rate of convergence of the asymptotic results. Section 5 contains the proofs of the statements from the previous sections.

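2. Estimation of the Rényi entropy functional

We introduce the $U$-statistic estimators of $q_{\mathbf{k}}$ proposed by Källberg et al. (2012). If $r$ is a non-negative integer, define $\mathcal{S}_{m,r}$ to be the set of all $r$-subsets of $\{1, \ldots, m\}$. Let $S \in \mathcal{S}_{n_1,k_1}$ and $T \in \mathcal{S}_{n_2,k_2}$. When $k_1 \geq 1$, we define

$$\psi^{(i)}_{\mathbf{k},\mathbf{n}}(S;T) := I\big(d(X_i, X_j) < \varepsilon,\ d(X_i, Y_l) < \varepsilon,\ \forall j \in S,\ \forall l \in T\big), \quad i \in S,$$

i.e., the indicator of the event that the observations $\{X_j, j \in S\}$ and $\{Y_l, l \in T\}$ are $\varepsilon$-close to $X_i$. By conditioning, we have

$$q_{\mathbf{k},\varepsilon} := E\big(\psi^{(i)}_{\mathbf{k},\mathbf{n}}(S;T)\big) = E\big(p_{X,\varepsilon}(X)^{k_1-1}\,p_{Y,\varepsilon}(X)^{k_2}\big).$$

In a similar way, when $k_1 = 0$ and $k_2 \geq 1$, we define

$$\psi^{(i)}_{\mathbf{k},\mathbf{n}}(T) := I\big(d(Y_i, Y_j) < \varepsilon,\ \forall j \in T\big), \quad i \in T,$$

and

$$q_{\mathbf{k},\varepsilon} := E\big(\psi^{(i)}_{\mathbf{k},\mathbf{n}}(T)\big) = E\big(p_{Y,\varepsilon}(Y)^{k_2-1}\big).$$

Now, a $U$-statistic for $q_{\mathbf{k},\varepsilon}$ (see, e.g., Ch. 2, Lee, 1990) is given by

$$\tilde{Q}_{\mathbf{k},\mathbf{n}} = \tilde{Q}_{\mathbf{k},\mathbf{n},\varepsilon} := \binom{n_1}{k_1}^{-1}\binom{n_2}{k_2}^{-1} \sum_{S \in \mathcal{S}_{n_1,k_1}} \sum_{T \in \mathcal{S}_{n_2,k_2}} \psi_{\mathbf{k},\mathbf{n}}(S;T),$$

with the kernel $\psi_{\mathbf{k},\mathbf{n}}(S;T)$ defined by the symmetrization

$$\psi_{\mathbf{k},\mathbf{n}}(S;T) := \begin{cases} \dfrac{1}{k_1}\sum_{i \in S} \psi^{(i)}_{\mathbf{k},\mathbf{n}}(S;T), & \text{if } k_1 \geq 1, \\[2mm] \dfrac{1}{k_2}\sum_{i \in T} \psi^{(i)}_{\mathbf{k},\mathbf{n}}(T), & \text{if } k_1 = 0,\ k_2 \geq 1. \end{cases}$$

By definition, $\tilde{Q}_{\mathbf{k},\mathbf{n}}$ is an unbiased estimator of $q_{\mathbf{k},\varepsilon}$. Let $k := k_1 + k_2$, $k \geq 2$, and define the estimator of $q_{\mathbf{k}}$ as

$$Q_{\mathbf{k},\mathbf{n}} := b_\varepsilon(d)^{-(k-1)}\,\tilde{Q}_{\mathbf{k},\mathbf{n}}.$$

Källberg et al. (2012) obtain consistency of $Q_{\mathbf{k},\mathbf{n}}$ when the densities are bounded and continuous. The following theorem yields weaker density conditions for consistency.

Theorem 1. If $p_X, p_Y \in L_{2k-1}(R^d)$, $n\varepsilon^{d(1-1/k)} \to \infty$, and $n_1/n \to \rho$, $0 < \rho < 1$, then

$$Q_{\mathbf{k},\mathbf{n}} \xrightarrow{P} q_{\mathbf{k}} \quad \text{as } \mathbf{n} \to \infty.$$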
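In the quadratic case $k = 2$ the estimators of this section reduce to counting $\varepsilon$-close pairs. The following Python sketch illustrates this for the functionals $q_{2,0}$ and $q_{1,1}$, assuming each sample is given as a list of equal-length coordinate tuples; the function names are hypothetical and the brute-force pair enumeration is chosen for clarity, not efficiency.

```python
import math
from itertools import combinations


def unit_ball_volume(d):
    """Volume b_1(d) of the unit ball in R^d."""
    return 2.0 * math.pi ** (d / 2) / (d * math.gamma(d / 2))


def q20_estimate(xs, eps):
    """Estimate q_{2,0}: fraction of eps-close pairs within one sample,
    divided by the eps-ball volume b_eps(d)."""
    d, n1 = len(xs[0]), len(xs)
    close = sum(1 for x, y in combinations(xs, 2) if math.dist(x, y) < eps)
    n_pairs = n1 * (n1 - 1) / 2
    return close / n_pairs / (eps ** d * unit_ball_volume(d))


def q11_estimate(xs, ys, eps):
    """Estimate q_{1,1}: fraction of eps-close (X_i, Y_j) pairs across the
    two samples, divided by the eps-ball volume b_eps(d)."""
    d = len(xs[0])
    close = sum(1 for x in xs for y in ys if math.dist(x, y) < eps)
    return close / (len(xs) * len(ys)) / (eps ** d * unit_ball_volume(d))
```

The estimator of $q_{0,2}$ is obtained by applying `q20_estimate` to the second sample.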

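3. Estimation of the quadratic functional $q_2(a)$

The following linear combination is a sensible estimator for the quadratic functional $q_2$:

$$Q_{2,\mathbf{n}} = Q_{2,\mathbf{n}}(a) := a_0 Q_{2,0,\mathbf{n}} + a_1 Q_{1,1,\mathbf{n}} + a_2 Q_{0,2,\mathbf{n}}.$$

For $0 < \rho < 1$, introduce the characteristics

$$\kappa = \kappa_\rho(a) := \frac{4}{\rho}\,\mathrm{Var}\Big(a_0 p_X(X) + \frac{a_1}{2}\,p_Y(X)\Big) + \frac{4}{1-\rho}\,\mathrm{Var}\Big(a_2 p_Y(Y) + \frac{a_1}{2}\,p_X(Y)\Big),$$

$$\eta = \eta_\rho(a) := \frac{2}{b_1(d)}\Big(\frac{a_0^2}{\rho^2}\,q_{2,0} + \frac{a_2^2}{(1-\rho)^2}\,q_{0,2} + \frac{a_1^2}{2\rho(1-\rho)}\,q_{1,1}\Big).$$

Let

$$\tilde{q}_{2,\varepsilon} := E(Q_{2,\mathbf{n}}) = a_0 \tilde{q}_{2,0,\varepsilon} + a_1 \tilde{q}_{1,1,\varepsilon} + a_2 \tilde{q}_{0,2,\varepsilon},$$

where $\tilde{q}_{\mathbf{k},\varepsilon} := E(Q_{\mathbf{k},\mathbf{n}}) = b_\varepsilon(d)^{-(k-1)}\,q_{\mathbf{k},\varepsilon}$. We get the following theorem for the asymptotic normality of $Q_{2,\mathbf{n}}$.

Theorem 2. Let $p_X, p_Y \in L_3(R^d)$, and $n_1/n = \rho$, $0 < \rho < 1$.

(i) If $n\varepsilon^d \to \beta$, $0 < \beta < \infty$, then

$$\sqrt{n}\,(Q_{2,\mathbf{n}} - \tilde{q}_{2,\varepsilon}) \xrightarrow{D} N(0, \kappa + \eta/\beta) \quad \text{as } \mathbf{n} \to \infty.$$

(ii) If $n\varepsilon^d \to 0$ and $n^2\varepsilon^d \to \infty$, then

$$n\varepsilon^{d/2}\,(Q_{2,\mathbf{n}} - \tilde{q}_{2,\varepsilon}) \xrightarrow{D} N(0, \eta) \quad \text{as } \mathbf{n} \to \infty.$$

From a practical point of view, the unknown asymptotic variances in Theorem 2 have to be estimated, i.e., we need consistent estimators for $\kappa$ and $\eta$. By expanding the terms in $\kappa$, we see that it is a function of $\rho$ and the functionals $\{q_{i,j} : 2 \leq i + j \leq 3\}$, i.e., $\kappa = \kappa(\rho, \{q_{i,j} : 2 \leq i + j \leq 3\})$. Since Theorem 1 yields that $\{Q_{i,j,\mathbf{n}} : 2 \leq i + j \leq 3\}$ are consistent estimators of these functionals, we set up the plug-in estimator

$$\kappa_{\mathbf{n}} := \kappa(\rho_{\mathbf{n}}, \{Q_{i,j,\mathbf{n}} : 2 \leq i + j \leq 3\})$$

for $\kappa$, where $\rho_{\mathbf{n}} := n_1/n$. Similarly, denote by $\eta_{\mathbf{n}}$ the corresponding estimator for $\eta$ and define $\nu_{\mathbf{n}} := \kappa_{\mathbf{n}} + \eta_{\mathbf{n}}/\beta_{\mathbf{n}}$, $\beta_{\mathbf{n}} := n\varepsilon^d$, to be an estimator of $\kappa + \eta/\beta$. It follows from the Slutsky theorem that $\eta_{\mathbf{n}}$ and $\nu_{\mathbf{n}}$ are consistent estimators of $\eta$ and $\kappa + \eta/\beta$, respectively.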

To ensure a sufficient rate of decay for the bias term $\tilde{q}_{2,\varepsilon} - q_2$, we propose smoothness conditions for the densities. Denote by $H_2^{(\alpha)}(K)$, $0 < \alpha \leq 1$, $K > 0$, a linear space of functions in $L_3(R^d)$ that satisfy an $\alpha$-Hölder condition in $L_2$-norm with constant $K$, i.e., if $p \in H_2^{(\alpha)}(K)$ and $h \in B_1(0)$, then

$$\|p(\cdot + h) - p(\cdot)\|_2 \leq K|h|^\alpha. \qquad (1)$$

Note that (1) holds if, e.g., for some function $g \in L_2(R^d)$,

$$|p(x + h) - p(x)| \leq g(x)\,|h|^\alpha.$$

The density smoothness can be introduced in different ways, e.g., by the pointwise Hölder conditions (Källberg et al., 2012) or the Fourier characterization (Giné and Nickl, 2008).

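A bound for the bias and the rate of convergence in probability are presented in the following theorem. Additionally, we obtain asymptotically pivotal quantities which can be used, e.g., to construct asymptotic confidence intervals for the functional $q_2$. Let $L(n)$, $n \geq 1$, be a slowly varying function as $n \to \infty$.

Theorem 3. Let $p_X, p_Y \in H_2^{(\alpha)}(K)$ and $n_1/n = \rho$, $0 < \rho < 1$.

(i) Then for the bias, we have

$$|\tilde{q}_{2,\varepsilon} - q_2| \leq C\varepsilon^{2\alpha}, \quad C > 0.$$

(ii) If $0 < \alpha \leq d/4$ and $\varepsilon \sim c\,n^{-1/(2\alpha + d/2)}$, $c > 0$, then

$$Q_{2,\mathbf{n}} - q_2 = O_P\big(n^{-2\alpha/(2\alpha + d/2)}\big) \quad \text{as } \mathbf{n} \to \infty.$$

(iii) If $\alpha > d/4$ and $n\varepsilon^d \to \beta$, $0 < \beta < \infty$, then

$$\sqrt{n}\,(Q_{2,\mathbf{n}} - q_2) \xrightarrow{D} N(0, \kappa + \eta/\beta) \quad \text{and} \quad \frac{\sqrt{n}}{\sqrt{\nu_{\mathbf{n}}}}\,(Q_{2,\mathbf{n}} - q_2) \xrightarrow{D} N(0, 1)$$

as $\mathbf{n} \to \infty$.

(iv) If $\varepsilon \sim L(n)\,n^{-2/d}$ and $L(n) \to \infty$, i.e., $n^2\varepsilon^d \to \infty$, then

$$n\varepsilon^{d/2}\,(Q_{2,\mathbf{n}} - q_2) \xrightarrow{D} N(0, \eta) \quad \text{and} \quad \frac{n\varepsilon^{d/2}}{\sqrt{\eta_{\mathbf{n}}}}\,(Q_{2,\mathbf{n}} - q_2) \xrightarrow{D} N(0, 1)$$

as $\mathbf{n} \to \infty$.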
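The radius choices in Theorem 3(ii) and (iii) can be illustrated by a small Python helper. This is only a sketch under stated assumptions: the function name and the constant `c` are hypothetical, and in the regime $\alpha > d/4$ the radius $\varepsilon = c\,n^{-1/d}$ is one convenient choice giving $n\varepsilon^d = c^d =: \beta$.

```python
def radius_and_rate(n, d, alpha, c=1.0):
    """Return (eps, r), where the estimation error is of order n^(-r).

    alpha <= d/4: eps = c * n^(-1/(2*alpha + d/2)), r = 2*alpha/(2*alpha + d/2).
    alpha >  d/4: eps = c * n^(-1/d) (so n*eps^d = c^d), r = 1/2.
    """
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must lie in (0, 1]")
    if alpha <= d / 4:
        denom = 2.0 * alpha + d / 2.0
        return c * n ** (-1.0 / denom), 2.0 * alpha / denom
    return c * n ** (-1.0 / d), 0.5
```

For example, with $d = 2$ and $\alpha = 1/2$ both branches meet: $\varepsilon = c\,n^{-1/2}$ and the rate exponent equals $1/2$.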

Remark 1. It is worth noting that in this paper, we do not require $\kappa > 0$ (cf. Källberg et al., 2012). The condition $\kappa > 0$ corresponds to the non-degeneracy condition commonly used for proving asymptotic normality of $U$-statistics by the conventional techniques, e.g., using the $H$-decomposition (see, e.g., Lee, 1990, Koroljuk and Borovskich, 1994). For example, when considering the divergence $D_2$, the condition $\kappa > 0$ implies that $p_X(x) \neq p_Y(x)$ on a set of positive measure. This assumption may be too restrictive in some statistical applications whenever the distributions of $X$ and $Y$ are too close.

Remark 2. In Theorem 3(iv), we get asymptotic normality for an arbitrary dimension. This is an improvement of some results in Leonenko and Seleznjev (2010) and Källberg et al. (2012). Note, however, that the rate of convergence $n\varepsilon^{d/2} \sim L(n)^{d/2}$ can be slower than $\sqrt{n}$ in this case.

Remark 3. The condition $n_1/n = \rho$, $0 < \rho < 1$, in Theorems 2 and 3 is technical and we claim that it can be replaced with the slightly weaker condition $n_1/n \to \rho$, $0 < \rho < 1$.

Remark 4. In the one-sample case, the results in Theorems 2 and 3 are essentially independent of $\rho$. In fact, consider, e.g., the estimator $Q_{2,0,\mathbf{n}}$ of $q_{2,0}$, i.e., $a = \{1, 0, 0\}$, $\kappa = 4\,\mathrm{Var}(p_X(X))/\rho$, and $\eta = 2b_1(d)^{-1}q_{2,0}/\rho^2$. We have $n = n_1/\rho$, so if $n_1\varepsilon^d \to \lambda$, $0 < \lambda < \infty$, then $n\varepsilon^d \to \lambda/\rho =: \beta$. Hence, it follows from Theorem 3(iii) that

$$\sqrt{n_1}\,(Q_{2,0,\mathbf{n}} - q_{2,0}) \xrightarrow{D} N\big(0,\ 4\,\mathrm{Var}(p_X(X)) + 2b_1(d)^{-1}q_{2,0}/\lambda\big) \quad \text{as } n_1 \to \infty.$$

Therefore, we obtain a result with $\sqrt{n_1}$-scaling that does not depend on $\rho$, as desired. A similar modification can be done for the $n\varepsilon^{d/2}$-scaling.

4. Applications

4.1. Estimation of density power divergence

The introduced quadratic divergence $D_2$ belongs to the wide class of density power divergences (Basu et al., 1998), defined by

$$D_s = D_s(\mathcal{P}_X, \mathcal{P}_Y) := \int_{R^d} \Big(\frac{1}{s-1}\,p_X(x)^s - \frac{s}{s-1}\,p_X(x)\,p_Y(x)^{s-1} + p_Y(x)^s\Big)dx = \frac{1}{s-1}\,q_{s,0} - \frac{s}{s-1}\,q_{1,s-1} + q_{0,s}, \quad s > 1.$$

When $s = r$ is an integer, $r \geq 2$, and $p_X, p_Y \in L_{2r-1}(R^d)$, then Theorem 1 implies that

$$D_{r,\mathbf{n}} := \frac{1}{r-1}\,Q_{r,0,\mathbf{n}} - \frac{r}{r-1}\,Q_{1,r-1,\mathbf{n}} + Q_{0,r,\mathbf{n}}$$

is a consistent estimator of $D_r$. Moreover, Theorem 3 gives conditions for asymptotic normality of the quadratic estimator $D_{2,\mathbf{n}}$.

The quadratic divergence $D_2$ can be used as a dissimilarity measure to investigate pairwise differences among $M$ populations or objects. Let the features of population $l$ be represented by the random feature vector $V_l$ with density $p_{V_l}(x)$, $x \in R^d$, $l = 1, \ldots, M$. Using independent samples from populations $V_l$, $l = 1, \ldots, M$, we apply, e.g., the Bonferroni method and calculate the $\binom{M}{2}$ approximate simultaneous confidence intervals $\{I_{lm}\}$ for the quadratic divergences $\{D_{2,l,m}\}$, $D_{2,l,m} := D_2(\mathcal{P}_{V_l}, \mathcal{P}_{V_m})$, for a given confidence level. Now, the intervals $\{I_{lm}\}$ can be used to determine which populations are different with respect to their feature densities.
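As an illustration of the quadratic case, the estimator $D_{2,\mathbf{n}} = Q_{2,0,\mathbf{n}} - 2Q_{1,1,\mathbf{n}} + Q_{0,2,\mathbf{n}}$ can be sketched in Python by counting $\varepsilon$-close pairs within and across the two samples. The function names are hypothetical, the samples are assumed to be lists of coordinate tuples, and the quadratic-time pair enumeration is used only for readability.

```python
import math
from itertools import combinations


def _ball_volume(eps, d):
    """Volume b_eps(d) of an eps-ball in R^d."""
    return eps ** d * 2.0 * math.pi ** (d / 2) / (d * math.gamma(d / 2))


def _within(sample, eps):
    """Fraction of eps-close pairs inside one sample."""
    n = len(sample)
    close = sum(1 for u, v in combinations(sample, 2) if math.dist(u, v) < eps)
    return close / (n * (n - 1) / 2)


def _across(xs, ys, eps):
    """Fraction of eps-close pairs between two samples."""
    close = sum(1 for x in xs for y in ys if math.dist(x, y) < eps)
    return close / (len(xs) * len(ys))


def d2_estimate(xs, ys, eps):
    """Estimate D_2 as Q_{2,0} - 2*Q_{1,1} + Q_{0,2}."""
    b = _ball_volume(eps, len(xs[0]))
    return (_within(xs, eps) - 2.0 * _across(xs, ys, eps) + _within(ys, eps)) / b
```

Note that the estimate can be negative in finite samples, since the three terms are estimated separately.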

Example 1. We consider estimation of the quadratic density power divergence $D_2(\mathcal{P}_X, \mathcal{P}_Y)$ between two three-dimensional distributions. The components of $X$ and $Y$ are $t(3)$-i.i.d. and $N(1,1)$-i.i.d., respectively. In this case it holds that $D_2 \approx 0.018$. We simulate $N_{sim} = 500$ independent and normalized residuals $R^{(i)}_{\mathbf{n}} := \sqrt{n}\,(D_{2,\mathbf{n}} - D_2)/\sqrt{\nu_{\mathbf{n}}}$, $i = 1, \ldots, N_{sim}$, with $n_1 = n_2 = 500$ and $\varepsilon = 1/4$. Figure 1 illustrates the performance of the normal approximation of $D_{2,\mathbf{n}}$ indicated by Theorem 3. The histogram, normal quantile plot, and p-value (0.41) for the Kolmogorov-Smirnov test also support the assumption of standard normality for the residuals.

[Figure 1: two panels, Histogram and Normal QQ-plot.]

Figure 1: Three-dimensional distributions with $t(3)$-i.i.d. and $N(1,1)$-i.i.d. components, respectively; sample sizes $n_1 = n_2 = 500$ and $\varepsilon = 1/4$. Standard normal approximation for the normalized residuals; $N_{sim} = 500$.

4.2. The two-sample problem

A general null hypothesis of closeness between $\mathcal{P}_X$ and $\mathcal{P}_Y$ is given by

$$H_0 : p_X(x) = p_Y(x) \ \text{a.e.}$$

We consider the problem of testing $H_0$ against the alternative $H_1$ that $p_X(x)$ and $p_Y(x)$ differ on a set of positive measure (often referred to as the two-sample problem). Note that the alternative can be written as $H_1 : D_2 > 0$. Hence, we define a test statistic based on the estimator $D_{2,\mathbf{n}}$ for $D_2$ (see, e.g., Li, 1996), according to

$$T_{\mathbf{n}} := \frac{n\varepsilon^{d/2}}{\sqrt{\eta_{\mathbf{n}}}}\,D_{2,\mathbf{n}}.$$

The asymptotics for the distribution of $T_{\mathbf{n}}$ are presented in the following proposition. Let $\{u_n\}$ be a numerical sequence such that $u_n = o(n\varepsilon^{d/2})$ as $\mathbf{n} \to \infty$.

Proposition 4. Assume that $p_X, p_Y \in L_3(R^d)$, $n^2\varepsilon^d \to \infty$, and $n_1/n = \rho$.

(i) Under $H_0$, we have

$$n\varepsilon^{d/2}\,D_{2,\mathbf{n}} \xrightarrow{D} N(0, \eta) \quad \text{and} \quad T_{\mathbf{n}} \xrightarrow{D} N(0, 1) \quad \text{as } \mathbf{n} \to \infty.$$

(ii) Under $H_1$, we have

$$P(T_{\mathbf{n}} > u_n) \to 1 \quad \text{as } \mathbf{n} \to \infty.$$

Thus, we reject $H_0$ if $T_{\mathbf{n}} > \lambda_\alpha$, where $\lambda_\alpha$ is the upper $\alpha$-quantile of the standard normal distribution, i.e., $P(Z > \lambda_\alpha) = \alpha$ for a standard normal $Z$. It follows that this test has asymptotic significance level $\alpha$ and is consistent against all alternatives that satisfy $p_X, p_Y \in L_3(R^d)$.
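The test can be sketched in Python as follows. This is an illustration under stated assumptions rather than the authors' implementation: the plug-in variance uses $\eta_\rho(a)$ from Section 3 with $a = \{1, -2, 1\}$ and $\rho$ replaced by $n_1/n$, the critical value is taken as the $(1-\alpha)$-quantile of the standard normal distribution, and all function names are hypothetical.

```python
import math
from itertools import combinations
from statistics import NormalDist


def _within(sample, eps):
    """Fraction of eps-close pairs inside one sample."""
    n = len(sample)
    close = sum(1 for u, v in combinations(sample, 2) if math.dist(u, v) < eps)
    return close / (n * (n - 1) / 2)


def _across(xs, ys, eps):
    """Fraction of eps-close pairs between two samples."""
    close = sum(1 for x in xs for y in ys if math.dist(x, y) < eps)
    return close / (len(xs) * len(ys))


def two_sample_test(xs, ys, eps, alpha=0.05):
    """Return (T_n, critical value, reject H0?) for the D_2-based test."""
    d, n1, n2 = len(xs[0]), len(xs), len(ys)
    n, rho = n1 + n2, n1 / (n1 + n2)
    b1 = 2.0 * math.pi ** (d / 2) / (d * math.gamma(d / 2))
    b_eps = eps ** d * b1
    q20 = _within(xs, eps) / b_eps
    q02 = _within(ys, eps) / b_eps
    q11 = _across(xs, ys, eps) / b_eps
    d2 = q20 - 2.0 * q11 + q02
    # Plug-in eta for a = (1, -2, 1): a_1^2 / (2*rho*(1-rho)) = 2 / (rho*(1-rho)).
    eta = (2.0 / b1) * (q20 / rho ** 2 + q02 / (1 - rho) ** 2
                        + 2.0 * q11 / (rho * (1 - rho)))
    if eta <= 0.0:
        raise ValueError("no eps-close pairs found; increase eps")
    t_stat = n * eps ** (d / 2) * d2 / math.sqrt(eta)
    critical = NormalDist().inv_cdf(1.0 - alpha)
    return t_stat, critical, t_stat > critical
```

The test is one-sided because $D_2 > 0$ under the alternative, so only large values of $T_{\mathbf{n}}$ speak against $H_0$.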

Remark 5. Since $q_{2,0} = q_{1,1} = q_{0,2}$ under $H_0$, the asymptotic variance in Proposition 4(i) is reduced to

$$\eta = \eta_0 := \frac{2\,q_{2,0}}{b_1(d)\,\rho^2(1-\rho)^2}.$$

Therefore, the test might be more accurate if $T_{\mathbf{n}}$ is redefined by replacing $\eta_{\mathbf{n}}$ with an estimate of $\eta_0$ based on the pooled sample $\{Z_1, \ldots, Z_n\} := \{X_1, \ldots, X_{n_1}, Y_1, \ldots, Y_{n_2}\}$ (cf. Li, 1999).

4.3. Estimation of Rényi entropy and differential variability

Consider the class of functionals

$$h_{\mathbf{k}} = h_{\mathbf{k}}(\mathcal{P}_X, \mathcal{P}_Y) := \frac{1}{1-k}\log(q_{\mathbf{k}}), \quad k \geq 2.$$

When $\mathcal{P}_X = \mathcal{P}_Y$, we get the Rényi entropy $h_{k,0}$, a family of functions for measuring uncertainty or randomness of a system (Rényi, 1970). Another important example is the differential variability $v = h_{1,1}$, a characteristic for modeling some random databases (Seleznjev and Thalheim, 2010). When the densities are bounded and continuous, the results in Källberg et al. (2012) imply consistency of the truncated plug-in estimator

$$H_{\mathbf{k},\mathbf{n}} := \frac{1}{1-k}\log\big(\max(Q_{\mathbf{k},\mathbf{n}}, 1/n)\big)$$

for $h_{\mathbf{k}}$. It follows from Theorem 1 and the Slutsky theorem that $H_{\mathbf{k},\mathbf{n}}$ is consistent under the weaker condition $p_X, p_Y \in L_{2k-1}(R^d)$.
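The truncation at $1/n$ keeps the logarithm finite when no $\varepsilon$-close observations are found. A minimal Python sketch of the plug-in formula, with a hypothetical function name and with `q_hat` standing for a previously computed value of $Q_{\mathbf{k},\mathbf{n}}$:

```python
import math


def truncated_entropy(q_hat, k, n):
    """Truncated plug-in estimate log(max(q_hat, 1/n)) / (1 - k), k >= 2."""
    if k < 2:
        raise ValueError("k must be at least 2")
    return math.log(max(q_hat, 1.0 / n)) / (1 - k)
```

With $k = 2$ and $q = 1/2$ this returns $\log 2$; with $q = 0$ the truncation yields $\log n$ instead of an infinite value.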

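In the quadratic cases $k = 2$, i.e., $\mathbf{k} = (2, 0), (1, 1)$, the asymptotic normality properties of $H_{\mathbf{k},\mathbf{n}}$ are studied by Leonenko and Seleznjev (2010) and Källberg et al. (2012). The following proposition generalizes some of these results (see also Remarks 1-4).

Proposition 5. Assume that $k = 2$. Let $p_X, p_Y \in H_2^{(\alpha)}(K)$ and $n_1/n = \rho$, $0 < \rho < 1$.

(i) If $0 < \alpha \leq d/4$ and $\varepsilon \sim c\,n^{-1/(2\alpha + d/2)}$, $c > 0$, then

$$H_{\mathbf{k},\mathbf{n}} - h_{\mathbf{k}} = O_P\big(n^{-2\alpha/(2\alpha + d/2)}\big) \quad \text{as } \mathbf{n} \to \infty.$$

(ii) If $\alpha > d/4$ and $n\varepsilon^d \to \beta$, $0 < \beta < \infty$, then

$$\sqrt{n}\,Q_{\mathbf{k},\mathbf{n}}\,(H_{\mathbf{k},\mathbf{n}} - h_{\mathbf{k}}) \xrightarrow{D} N(0, \kappa + \eta/\beta) \quad \text{and} \quad \frac{\sqrt{n}\,Q_{\mathbf{k},\mathbf{n}}}{\sqrt{\nu_{\mathbf{n}}}}\,(H_{\mathbf{k},\mathbf{n}} - h_{\mathbf{k}}) \xrightarrow{D} N(0, 1)$$

as $\mathbf{n} \to \infty$.

(iii) If $\varepsilon \sim L(n)\,n^{-2/d}$ and $L(n) \to \infty$, i.e., $n^2\varepsilon^d \to \infty$, then

$$n\varepsilon^{d/2}\,Q_{\mathbf{k},\mathbf{n}}\,(H_{\mathbf{k},\mathbf{n}} - h_{\mathbf{k}}) \xrightarrow{D} N(0, \eta) \quad \text{and} \quad \frac{n\varepsilon^{d/2}\,Q_{\mathbf{k},\mathbf{n}}}{\sqrt{\eta_{\mathbf{n}}}}\,(H_{\mathbf{k},\mathbf{n}} - h_{\mathbf{k}}) \xrightarrow{D} N(0, 1)$$

as $\mathbf{n} \to \infty$.

The estimator $H_{\mathbf{k},\mathbf{n}}$ of $h_{\mathbf{k}}$ can be used, e.g., for distribution identification problems and approximate matching in stochastic databases (for a description, see Källberg et al., 2012).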

Example 2. Let $X$ and $Y$ be one-dimensional uniform random variables, i.e., $X \sim U(0,1)$ and $Y \sim U(0,\sqrt{2})$, and consider estimation of the differential variability $v = h_{1,1} = \log(2)/2$. We simulate independent and normalized residuals $R^{(i)}_{\mathbf{n}} := \sqrt{n}\,Q_{1,1,\mathbf{n}}(H_{1,1,\mathbf{n}} - h_{1,1})/\sqrt{\nu_{\mathbf{n}}}$, $i = 1, \ldots, N_{sim}$, with $n_1 = n_2 = 300$, $\varepsilon = 1/100$, and $N_{sim} = 600$. Figure 2 illustrates the normal approximation for these residuals indicated by Proposition 5. The histogram, quantile plot, and p-value (0.36) for the Kolmogorov-Smirnov test allow us to accept the hypothesis of standard normality.
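A single replication of the setting in Example 2 can be sketched in Python as follows. This is not the authors' simulation code: the seed, the function name, and the use of the standard library generator are assumptions, and only the point estimate $H_{1,1,\mathbf{n}}$ is computed, not the normalized residual (which would additionally require $\nu_{\mathbf{n}}$).

```python
import math
import random


def simulate_variability(n1=300, n2=300, eps=0.01, seed=0):
    """One replication: X ~ U(0,1), Y ~ U(0,sqrt(2)); returns (Q_11, H_11)."""
    rng = random.Random(seed)
    xs = [rng.uniform(0.0, 1.0) for _ in range(n1)]
    ys = [rng.uniform(0.0, math.sqrt(2.0)) for _ in range(n2)]
    # Count eps-close (X_i, Y_j) pairs; in one dimension b_eps(1) = 2*eps.
    close = sum(1 for x in xs for y in ys if abs(x - y) < eps)
    q11_hat = close / (n1 * n2) / (2.0 * eps)
    # Truncated plug-in estimator with k = 2, i.e. -log(max(Q, 1/n)).
    h_hat = -math.log(max(q11_hat, 1.0 / (n1 + n2)))
    return q11_hat, h_hat
```

Here $q_{1,1} = \int_0^1 1 \cdot 2^{-1/2}\,dx = 2^{-1/2}$, so the estimate `h_hat` should lie near $\log(2)/2 \approx 0.347$ for these sample sizes.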



[Figure 2: two panels, Histogram (horizontal axis: Residuals) and Normal QQ-plot (horizontal axis: Normal quantiles).]

Figure 2: Uniform distributions, $U(0,1)$ and $U(0,\sqrt{2})$; sample sizes $n_1 = n_2 = 300$ and $\varepsilon = 1/100$. Standard normal approximation for the normalized residuals; $N_{sim} = 600$.

5. Proofs 

The following lemma is used in the subsequent proofs. 
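Lemma 1. For $a, b \geq 0$, assume that $p_X, p_Y \in L_{a+b+1}(R^d)$. Then

$$b_\varepsilon(d)^{-(a+b)}\,E\big(p_{X,\varepsilon}(X)^a\,p_{Y,\varepsilon}(X)^b\big) \to q_{a+1,b} \quad \text{as } \varepsilon \to 0.$$

Proof. Let $\tilde{p}_{X,\varepsilon}(x) := b_\varepsilon(d)^{-1}p_{X,\varepsilon}(x)$, $\tilde{p}_{Y,\varepsilon}(x) := b_\varepsilon(d)^{-1}p_{Y,\varepsilon}(x)$. Consider the decomposition

$$b_\varepsilon(d)^{-(a+b)}\,E\big(p_{X,\varepsilon}(X)^a\,p_{Y,\varepsilon}(X)^b\big) = \int_{R^d} \tilde{p}_{X,\varepsilon}(x)^a\,\tilde{p}_{Y,\varepsilon}(x)^b\,p_X(x)\,dx$$

$$= \int_{R^d} p_X(x)^{a+1}p_Y(x)^b\,dx + \int_{R^d} \big(\tilde{p}_{Y,\varepsilon}(x)^b - p_Y(x)^b\big)p_X(x)^{a+1}\,dx + \int_{R^d} \big(\tilde{p}_{X,\varepsilon}(x)^a - p_X(x)^a\big)\tilde{p}_{Y,\varepsilon}(x)^b\,p_X(x)\,dx. \qquad (2)$$

Hence, the assertion follows if the last two terms in (2) tend to 0 as $\varepsilon \to 0$. By the extension of Hölder's inequality (see, e.g., Ch. 2, Bogachev, 2007),

$$\Big|\int_{R^d} \big(\tilde{p}_{X,\varepsilon}(x)^a - p_X(x)^a\big)\tilde{p}_{Y,\varepsilon}(x)^b\,p_X(x)\,dx\Big| \leq \|\tilde{p}_{X,\varepsilon}(\cdot)^a - p_X(\cdot)^a\|_{(a+b+1)/a}\,\|\tilde{p}_{Y,\varepsilon}(\cdot)^b\|_{(a+b+1)/b}\,\|p_X(\cdot)\|_{a+b+1}. \qquad (3)$$

The Lebesgue differentiation theorem implies

$$\tilde{p}_{X,\varepsilon}(x)^{a+b+1} \to p_X(x)^{a+b+1} \quad \text{as } \varepsilon \to 0 \text{ a.e.} \qquad (4)$$

If $V = (V_1, \ldots, V_d)'$ is an auxiliary random vector uniformly distributed in the unit ball $B_1(0)$, then $\tilde{p}_{X,\varepsilon}(x) = E(p_X(x - \varepsilon V))$, and by Jensen's inequality,

$$\big(\tilde{p}_{X,\varepsilon}(x)^a\big)^{(a+b+1)/a} \leq g_\varepsilon(x) := E\big(p_X(x - \varepsilon V)^{a+b+1}\big). \qquad (5)$$

Since $p_X \in L_{a+b+1}(R^d)$, it follows from the Lebesgue differentiation theorem that

$$g_\varepsilon(x) \to g(x) := p_X(x)^{a+b+1} \quad \text{as } \varepsilon \to 0 \text{ a.e.} \qquad (6)$$

Furthermore, Fubini's theorem yields

$$\int_{R^d} g_\varepsilon(x)\,dx = \int_{R^d} g(x)\,dx. \qquad (7)$$

We get from (4)-(7) and a generalization of the dominated convergence theorem (see, e.g., Ch. 2, Bogachev, 2007) that

$$\|\tilde{p}_{X,\varepsilon}(\cdot)^a\|_{(a+b+1)/a} \to \|p_X(\cdot)^a\|_{(a+b+1)/a} \quad \text{as } \varepsilon \to 0. \qquad (8)$$

Similarly,

$$\|\tilde{p}_{Y,\varepsilon}(\cdot)^b\|_{(a+b+1)/b} \to \|p_Y(\cdot)^b\|_{(a+b+1)/b} \quad \text{as } \varepsilon \to 0. \qquad (9)$$

Now we use the following result (see, e.g., Ch. 1, Kallenberg, 1997): For a sequence of functions $f_n \in L_p(R^d)$, $p \geq 1$, $n = 1, 2, \ldots$, with $f_n(x) \to f(x)$ a.e., $f \in L_p(R^d)$, it holds that

$$\|f_n\|_p \to \|f\|_p \quad \text{iff} \quad \|f_n - f\|_p \to 0 \quad \text{as } n \to \infty. \qquad (10)$$

Note that (4), (8), and (10) imply

$$\|\tilde{p}_{X,\varepsilon}(\cdot)^a - p_X(\cdot)^a\|_{(a+b+1)/a} \to 0 \quad \text{as } \varepsilon \to 0. \qquad (11)$$

Finally, it follows from (3), (9), and (11) that

$$\int_{R^d} \big(\tilde{p}_{X,\varepsilon}(x)^a - p_X(x)^a\big)\tilde{p}_{Y,\varepsilon}(x)^b\,p_X(x)\,dx \to 0 \quad \text{as } \varepsilon \to 0.$$

In a similar way, we obtain

$$\int_{R^d} \big(\tilde{p}_{Y,\varepsilon}(x)^b - p_Y(x)^b\big)p_X(x)^{a+1}\,dx \to 0 \quad \text{as } \varepsilon \to 0.$$

This completes the proof. □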



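Proof of Theorem 1. For $l = 0, \ldots, k_1$, and $m = 0, \ldots, k_2$, let

$$\psi_{\mathbf{k},l,m,\mathbf{n}}(x_1, \ldots, x_l; y_1, \ldots, y_m) := E\big(\psi_{\mathbf{k},\mathbf{n}}(x_1, \ldots, x_l, X_{l+1}, \ldots, X_{k_1}; y_1, \ldots, y_m, Y_{m+1}, \ldots, Y_{k_2})\big)$$

and

$$\sigma^2_{\mathbf{k},l,m,\mathbf{n}} := \mathrm{Var}\big(\psi_{\mathbf{k},l,m,\mathbf{n}}(X_1, \ldots, X_l; Y_1, \ldots, Y_m)\big),$$

where we define $\sigma^2_{\mathbf{k},0,0,\mathbf{n}} := 0$. From the conventional theory of $U$-statistics (see, e.g., Ch. 2, Lee, 1990), we have

$$\mathrm{Var}(Q_{\mathbf{k},\mathbf{n}}) = b_\varepsilon(d)^{-2(k-1)} \sum_{l=0}^{k_1}\sum_{m=0}^{k_2} \frac{\binom{k_1}{l}\binom{n_1-k_1}{k_1-l}}{\binom{n_1}{k_1}}\,\frac{\binom{k_2}{m}\binom{n_2-k_2}{k_2-m}}{\binom{n_2}{k_2}}\,\sigma^2_{\mathbf{k},l,m,\mathbf{n}}. \qquad (12)$$

First, assume that $k_1 \geq 1$. Following the argument in Källberg et al. (2012), it is straightforward to show that

$$\sigma^2_{\mathbf{k},l,m,\mathbf{n}} \leq E\big(p_{X,3\varepsilon}(X)^{2k_1-l-1}\,p_{Y,3\varepsilon}(X)^{2k_2-m}\big),$$

so Lemma 1 implies

$$\sigma^2_{\mathbf{k},l,m,\mathbf{n}} = O\big(b_\varepsilon(d)^{2k-l-m-1}\big) \quad \text{as } \mathbf{n} \to \infty. \qquad (13)$$

For $l = 0, \ldots, k_1$ and $m = 0, \ldots, k_2$, we obtain

$$b_\varepsilon(d)^{-2(k-1)}\,\frac{\binom{k_1}{l}\binom{n_1-k_1}{k_1-l}}{\binom{n_1}{k_1}}\,\frac{\binom{k_2}{m}\binom{n_2-k_2}{k_2-m}}{\binom{n_2}{k_2}}\,\sigma^2_{\mathbf{k},l,m,\mathbf{n}} \sim \frac{C_{l,m}}{n^{l+m}\varepsilon^{d(l+m-1)}}\,b_\varepsilon(d)^{-(2k-l-m-1)}\,\sigma^2_{\mathbf{k},l,m,\mathbf{n}} \qquad (14)$$

for some constant $C_{l,m} > 0$. Since $l + m \leq k$, we have

$$n^{l+m}\varepsilon^{d(l+m-1)} = \big(n\varepsilon^{d(1-1/(l+m))}\big)^{l+m} \geq \big(n\varepsilon^{d(1-1/k)}\big)^{l+m}. \qquad (15)$$

Now, if $n\varepsilon^{d(1-1/k)} \to c$, $0 < c \leq \infty$, it follows from (12)-(15) that

$$\mathrm{Var}(Q_{\mathbf{k},\mathbf{n}}) = O\Big(\frac{1}{n\varepsilon^{d(1-1/k)}}\Big) \quad \text{as } \mathbf{n} \to \infty. \qquad (16)$$

In the same way, it can be shown that (16) holds when $k_1 = 0$. In particular, if $n\varepsilon^{d(1-1/k)} \to \infty$, we get

$$\mathrm{Var}(Q_{\mathbf{k},\mathbf{n}}) \to 0 \quad \text{as } \mathbf{n} \to \infty. \qquad (17)$$

Moreover, it follows from Lemma 1 that $E(Q_{\mathbf{k},\mathbf{n}}) = b_\varepsilon(d)^{-(k-1)}q_{\mathbf{k},\varepsilon} \to q_{\mathbf{k}}$, so we obtain from (17) that $Q_{\mathbf{k},\mathbf{n}} \xrightarrow{P} q_{\mathbf{k}}$ as $\mathbf{n} \to \infty$. This completes the proof. □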



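Proof of Theorem 2. Note that (i) and (ii) can be expressed together as follows: If $n\varepsilon^d \to \beta$, $0 \leq \beta < \infty$, and $n^2\varepsilon^d \to \infty$, then

$$n\varepsilon^{d/2}\,(Q_{2,\mathbf{n}} - \tilde{q}_{2,\varepsilon}) \xrightarrow{D} N(0, \eta + \beta\kappa).$$

So we prove the theorem using the scaling $n\varepsilon^{d/2}$ for both (i) and (ii). If $n_3 = n_3(\mathbf{n})$ is defined to be the greatest common divisor of $n_1$ and $n_2$, then $n_1 = n_3 l$ and $n_2 = n_3 m$, where $l$ and $m$ are positive integers that satisfy $l/(l + m) = \rho$. Consider the following pooled random vectors in $R^{d(l+m)}$:

$$Z_i := (X_{l(i-1)+1}, \ldots, X_{li}, Y_{m(i-1)+1}, \ldots, Y_{mi}), \quad i = 1, \ldots, n_3.$$

The method of proof relies on the decomposition

$$n\varepsilon^{d/2}\,(Q_{2,\mathbf{n}} - \tilde{q}_{2,\varepsilon}) = n\varepsilon^{d/2}\binom{n_3}{2}^{-1} b_\varepsilon(d)^{-1}\,(U_{\mathbf{n}} - E(U_{\mathbf{n}})) + R_{\mathbf{n}}, \qquad (18)$$

where $U_{\mathbf{n}}$, to be defined later, essentially is a one-sample $U$-statistic with respect to the i.i.d. sample $\{Z_1, \ldots, Z_{n_3}\}$. The idea is to prove that the remainder $R_{\mathbf{n}}$ tends to 0 in probability and use the corresponding result from Jammalamadaka and Janson (1986) to show asymptotic normality for the first term in (18).

For $z_i := (x_{l(i-1)+1}, \ldots, x_{li}, y_{m(i-1)+1}, \ldots, y_{mi})$, $i = 1, \ldots, n_3$, introduce the kernels

$$\phi^{(1)}_{\mathbf{n}}(z_i, z_j) := \sum_{s=1}^{l}\sum_{t=1}^{l} I\big(d(x_{l(i-1)+s}, x_{l(j-1)+t}) < \varepsilon\big),$$

$$\phi^{(2)}_{\mathbf{n}}(z_i, z_j) := \sum_{s=1}^{m}\sum_{t=1}^{m} I\big(d(y_{m(i-1)+s}, y_{m(j-1)+t}) < \varepsilon\big),$$

$$\phi^{(3)}_{\mathbf{n}}(z_i, z_j) := \sum_{s=1}^{l}\sum_{t=1}^{m} I\big(d(x_{l(i-1)+s}, y_{m(j-1)+t}) < \varepsilon\big) + \sum_{s=1}^{l}\sum_{t=1}^{m} I\big(d(x_{l(j-1)+s}, y_{m(i-1)+t}) < \varepsilon\big).$$

Furthermore, define

$$f_{\mathbf{n}}(z_i, z_j) := \frac{a_0}{l^2}\,\phi^{(1)}_{\mathbf{n}}(z_i, z_j) + \frac{a_2}{m^2}\,\phi^{(2)}_{\mathbf{n}}(z_i, z_j) + \frac{a_1}{2lm}\,\phi^{(3)}_{\mathbf{n}}(z_i, z_j),$$

$$\mu_{\mathbf{n}} := E(f_{\mathbf{n}}(Z_1, Z_2)) = b_\varepsilon(d)\,\tilde{q}_{2,\varepsilon},$$

$$g_{\mathbf{n}}(z_i) := E(f_{\mathbf{n}}(z_i, Z_j)) - \mu_{\mathbf{n}} = \frac{1}{l}\sum_{s=1}^{l}\Big(a_0 p_{X,\varepsilon}(x_{l(i-1)+s}) + \frac{a_1}{2}\,p_{Y,\varepsilon}(x_{l(i-1)+s})\Big) + \frac{1}{m}\sum_{s=1}^{m}\Big(a_2 p_{Y,\varepsilon}(y_{m(i-1)+s}) + \frac{a_1}{2}\,p_{X,\varepsilon}(y_{m(i-1)+s})\Big) - \mu_{\mathbf{n}}. \qquad (19)$$

Let

$$M_{\mathbf{n}} := \sum_{i<j} I(d(X_i, X_j) < \varepsilon), \quad V_{\mathbf{n}} := \sum_{i<j} I(d(Y_i, Y_j) < \varepsilon), \quad W_{\mathbf{n}} := \sum_{i,j} I(d(X_i, Y_j) < \varepsilon),$$

and note that

$$b_\varepsilon(d)\,Q_{2,\mathbf{n}} = a_0\binom{n_1}{2}^{-1} M_{\mathbf{n}} + a_2\binom{n_2}{2}^{-1} V_{\mathbf{n}} + a_1 (n_1 n_2)^{-1} W_{\mathbf{n}}. \qquad (20)$$

Now consider the decompositions

$$M_{\mathbf{n}} = M^{(1)}_{\mathbf{n}} + M^{(2)}_{\mathbf{n}}, \quad V_{\mathbf{n}} = V^{(1)}_{\mathbf{n}} + V^{(2)}_{\mathbf{n}}, \quad W_{\mathbf{n}} = W^{(1)}_{\mathbf{n}} + W^{(2)}_{\mathbf{n}},$$

where

$$M^{(1)}_{\mathbf{n}} := \sum_{i<j} \phi^{(1)}_{\mathbf{n}}(Z_i, Z_j), \quad V^{(1)}_{\mathbf{n}} := \sum_{i<j} \phi^{(2)}_{\mathbf{n}}(Z_i, Z_j), \quad W^{(1)}_{\mathbf{n}} := \sum_{i<j} \phi^{(3)}_{\mathbf{n}}(Z_i, Z_j),$$

and define

$$U_{\mathbf{n}} := \sum_{i<j} f_{\mathbf{n}}(Z_i, Z_j) = \frac{a_0}{l^2}\,M^{(1)}_{\mathbf{n}} + \frac{a_2}{m^2}\,V^{(1)}_{\mathbf{n}} + \frac{a_1}{2lm}\,W^{(1)}_{\mathbf{n}}.$$

With this definition of $U_{\mathbf{n}}$, it follows from (20) that the decomposition (18) holds with remainder

$$R_{\mathbf{n}} = R^{(1)}_{\mathbf{n}} + R^{(2)}_{\mathbf{n}} + R^{(3)}_{\mathbf{n}} + R^{(4)}_{\mathbf{n}} + R^{(5)}_{\mathbf{n}} + R^{(6)}_{\mathbf{n}},$$

where, writing $\bar{S} := S - E(S)$ for a centered statistic,

$$R^{(1)}_{\mathbf{n}} := a_0\Big(\binom{n_1}{2}^{-1} - l^{-2}\binom{n_3}{2}^{-1}\Big) n\varepsilon^{d/2} b_\varepsilon(d)^{-1}\,\bar{M}^{(1)}_{\mathbf{n}}, \quad R^{(2)}_{\mathbf{n}} := a_2\Big(\binom{n_2}{2}^{-1} - m^{-2}\binom{n_3}{2}^{-1}\Big) n\varepsilon^{d/2} b_\varepsilon(d)^{-1}\,\bar{V}^{(1)}_{\mathbf{n}},$$

$$R^{(3)}_{\mathbf{n}} := a_1\Big((n_1 n_2)^{-1} - (2lm)^{-1}\binom{n_3}{2}^{-1}\Big) n\varepsilon^{d/2} b_\varepsilon(d)^{-1}\,\bar{W}^{(1)}_{\mathbf{n}}, \quad R^{(4)}_{\mathbf{n}} := a_0\binom{n_1}{2}^{-1} n\varepsilon^{d/2} b_\varepsilon(d)^{-1}\,\bar{M}^{(2)}_{\mathbf{n}},$$

$$R^{(5)}_{\mathbf{n}} := a_2\binom{n_2}{2}^{-1} n\varepsilon^{d/2} b_\varepsilon(d)^{-1}\,\bar{V}^{(2)}_{\mathbf{n}}, \quad R^{(6)}_{\mathbf{n}} := a_1 (n_1 n_2)^{-1} n\varepsilon^{d/2} b_\varepsilon(d)^{-1}\,\bar{W}^{(2)}_{\mathbf{n}}.$$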

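By the conventional theory for one-sample $U$-statistics (see, e.g., Ch. 1, Lee, 1990), we have

$$\mathrm{Var}(M^{(1)}_{\mathbf{n}}) = \binom{n_3}{2}\big(2(n_3 - 2)\,\xi_{1,\mathbf{n}} + \xi_{2,\mathbf{n}}\big), \qquad (21)$$

where

$$\xi_{1,\mathbf{n}} := \mathrm{Cov}\big(\phi^{(1)}_{\mathbf{n}}(Z_1, Z_2), \phi^{(1)}_{\mathbf{n}}(Z_1, Z_3)\big), \quad \xi_{2,\mathbf{n}} := \mathrm{Var}\big(\phi^{(1)}_{\mathbf{n}}(Z_1, Z_2)\big).$$

It follows that

$$\xi_{1,\mathbf{n}} \leq \xi_{2,\mathbf{n}} \leq E\big(\phi^{(1)}_{\mathbf{n}}(Z_1, Z_2)^2\big) \leq l^2\,E\big(\phi^{(1)}_{\mathbf{n}}(Z_1, Z_2)\big) = l^4\,P(d(X_1, X_2) < \varepsilon),$$

and hence $\xi_{1,\mathbf{n}}, \xi_{2,\mathbf{n}} = O(b_\varepsilon(d))$. Therefore, we get from (21) that

$$\mathrm{Var}(M^{(1)}_{\mathbf{n}}) = O\big(n_3^3\,b_\varepsilon(d)\big) \quad \text{as } \mathbf{n} \to \infty. \qquad (22)$$

Since

$$\Big(\binom{n_1}{2}^{-1} - l^{-2}\binom{n_3}{2}^{-1}\Big)^2 = O(n^{-6}) \quad \text{as } \mathbf{n} \to \infty,$$

it follows from (22) that

$$\mathrm{Var}(R^{(1)}_{\mathbf{n}}) = O(n^{-1}) \to 0 \quad \text{as } \mathbf{n} \to \infty. \qquad (23)$$

Similarly, for $i = 2, 3$,

$$\mathrm{Var}(R^{(i)}_{\mathbf{n}}) \to 0 \quad \text{as } \mathbf{n} \to \infty. \qquad (24)$$

Moreover, if a kernel is defined as

$$\varrho_{\mathbf{n}}(z_i) := \sum_{1 \leq j < k \leq l} I\big(d(x_{l(i-1)+j}, x_{l(i-1)+k}) < \varepsilon\big), \quad i = 1, \ldots, n_3,$$

then

$$M^{(2)}_{\mathbf{n}} = \sum_{i=1}^{n_3} \varrho_{\mathbf{n}}(Z_i),$$

and by Lemma 1,

$$\mathrm{Var}(M^{(2)}_{\mathbf{n}}) = n_3\,\mathrm{Var}(\varrho_{\mathbf{n}}(Z_1)) \sim n_3\binom{l}{2} b_\varepsilon(d)\,q_{2,0} = O(n_3\varepsilon^d) \quad \text{as } \mathbf{n} \to \infty.$$

This yields

$$\mathrm{Var}(R^{(4)}_{\mathbf{n}}) = O(n^{-1}) \to 0 \quad \text{as } \mathbf{n} \to \infty. \qquad (25)$$

In a similar way, for $i = 5, 6$,

$$\mathrm{Var}(R^{(i)}_{\mathbf{n}}) \to 0 \quad \text{as } \mathbf{n} \to \infty. \qquad (26)$$

Since $E(R_{\mathbf{n}}) = 0$, it follows from (23)-(26) that

$$R_{\mathbf{n}} \xrightarrow{P} 0 \quad \text{as } \mathbf{n} \to \infty. \qquad (27)$$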
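Next we prove asymptotic normality for $U_{\mathbf{n}}$. Let

$$\sigma^2_{\mathbf{n}} := \frac{n_3^2}{2}\,\mathrm{Var}(f_{\mathbf{n}}(Z_1, Z_2)) + n_3^3\,\mathrm{Var}(g_{\mathbf{n}}(Z_1)).$$

Using Lemma 1, it is straightforward to show that, as $\mathbf{n} \to \infty$,

$$\sigma^2_{\mathbf{n}} \sim \frac{n_3^2\,b_\varepsilon(d)}{2}\Big(\frac{a_0^2}{l^2}\,q_{2,0} + \frac{a_2^2}{m^2}\,q_{0,2} + \frac{a_1^2}{2lm}\,q_{1,1}\Big) + n_3^3\,b_\varepsilon(d)^2\Big(\frac{1}{l}\,\mathrm{Var}\Big(a_0 p_X(X) + \frac{a_1}{2}\,p_Y(X)\Big) + \frac{1}{m}\,\mathrm{Var}\Big(a_2 p_Y(Y) + \frac{a_1}{2}\,p_X(Y)\Big)\Big). \qquad (28)$$

Since $n^2\varepsilon^d \to \infty$ implies $n_3^2\,b_\varepsilon(d) \to \infty$, we get from (28) that

$$\sigma_{\mathbf{n}} \to \infty \quad \text{as } \mathbf{n} \to \infty, \qquad (29)$$

and hence

$$\sup_{z_1, z_2} |f_{\mathbf{n}}(z_1, z_2)| \leq |a_0| + |a_2| + |a_1| = o(\sigma_{\mathbf{n}}) \quad \text{as } \mathbf{n} \to \infty. \qquad (30)$$

Moreover, note that

$$E(|f_{\mathbf{n}}(z_1, Z_2)|) \leq \frac{|a_0|}{l}\sum_{i=1}^{l} p_{X,\varepsilon}(x_i) + \frac{|a_2|}{m}\sum_{i=1}^{m} p_{Y,\varepsilon}(y_i) + \frac{|a_1|}{2l}\sum_{i=1}^{l} p_{Y,\varepsilon}(x_i) + \frac{|a_1|}{2m}\sum_{i=1}^{m} p_{X,\varepsilon}(y_i). \qquad (31)$$

By Hölder's inequality,

$$p_{X,\varepsilon}(x) = \int_{|y-x|<\varepsilon} p_X(y)\,dy \leq \Big(\int_{|y-x|<\varepsilon} dy\Big)^{1/2}\Big(\int_{|y-x|<\varepsilon} p_X(y)^2\,dy\Big)^{1/2} = b_\varepsilon(d)^{1/2}\Big(\int_{|y-x|<\varepsilon} p_X(y)^2\,dy\Big)^{1/2},$$

where the last integral tends to 0 uniformly in $x$ as $\varepsilon \to 0$. The corresponding results can be shown for the other terms in (31). Hence, we obtain

$$\sup_{z_1} E(|f_{\mathbf{n}}(z_1, Z_2)|) = o\big(b_\varepsilon(d)^{1/2}\big) = o(\sigma_{\mathbf{n}}/n_3). \qquad (32)$$

Now, it follows from (30) and (32) that the conditions of Theorem 2.1 in Jammalamadaka and Janson (1986) hold. Consequently,

$$\frac{U_{\mathbf{n}} - E(U_{\mathbf{n}})}{\sigma_{\mathbf{n}}} = \frac{1}{\sigma_{\mathbf{n}}}\sum_{i<j}\big(f_{\mathbf{n}}(Z_i, Z_j) - \mu_{\mathbf{n}}\big) \xrightarrow{D} N(0, 1) \quad \text{as } \mathbf{n} \to \infty. \qquad (33)$$

Moreover, since $n\varepsilon^d \to \beta$, $0 \leq \beta < \infty$, it follows from (28) and Lemma 1 that

$$n^2\varepsilon^d\binom{n_3}{2}^{-2} b_\varepsilon(d)^{-2}\,\sigma^2_{\mathbf{n}} \to \eta + \beta\kappa \quad \text{as } \mathbf{n} \to \infty. \qquad (34)$$

Finally, (18), (27), (33), (34), and the Slutsky theorem yield

$$n\varepsilon^{d/2}\,(Q_{2,\mathbf{n}} - \tilde{q}_{2,\varepsilon}) \xrightarrow{D} N(0, \eta + \beta\kappa) \quad \text{as } \mathbf{n} \to \infty.$$

This completes the proof. □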



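Proof of Theorem 3. (i) Let $V := (V_1, \ldots, V_d)'$ be an auxiliary random vector uniformly distributed in the unit ball $B_1(0)$. By definition, we have $\tilde{q}_{1,1,\varepsilon} = b_\varepsilon(d)^{-1}E(p_{X,\varepsilon}(Y)) = E(p_X(Y - \varepsilon V))$, and hence

$$\tilde{q}_{1,1,\varepsilon} - q_{1,1} = \int_{R^d} E\big(p_X(y - \varepsilon V)\big)p_Y(y)\,dy - \int_{R^d} p_X(y)\,p_Y(y)\,dy = E\Big(\int_{R^d} \big(p_X(y - \varepsilon V) - p_X(y)\big)p_Y(y)\,dy\Big)$$

$$= E\Big(\int_{R^d} \big(p_X(y - \varepsilon V) - p_X(y)\big)\big(p_Y(y) - p_Y(y - \varepsilon V)\big)dy\Big) + E\Big(\int_{R^d} \big(p_X(y - \varepsilon V) - p_X(y)\big)p_Y(y - \varepsilon V)\,dy\Big).$$

For the last term, by the change of variables $z = y - \varepsilon V$ and the symmetry $V \stackrel{D}{=} -V$, we obtain

$$E\Big(\int_{R^d} \big(p_X(y - \varepsilon V) - p_X(y)\big)p_Y(y - \varepsilon V)\,dy\Big) = E\Big(\int_{R^d} \big(p_X(z) - p_X(z + \varepsilon V)\big)p_Y(z)\,dz\Big) = E\Big(\int_{R^d} \big(p_X(z) - p_X(z - \varepsilon V)\big)p_Y(z)\,dz\Big) = -(\tilde{q}_{1,1,\varepsilon} - q_{1,1}).$$

We get

$$2(\tilde{q}_{1,1,\varepsilon} - q_{1,1}) = E\Big(\int_{R^d} \big(p_X(y - \varepsilon V) - p_X(y)\big)\big(p_Y(y) - p_Y(y - \varepsilon V)\big)dy\Big)$$

and, by Hölder's inequality and the density smoothness condition,

$$|\tilde{q}_{1,1,\varepsilon} - q_{1,1}| \leq \frac{1}{2}\,E\Big(\Big(\int_{R^d} \big(p_X(y - \varepsilon V) - p_X(y)\big)^2 dy\Big)^{1/2}\Big(\int_{R^d} \big(p_Y(y - \varepsilon V) - p_Y(y)\big)^2 dy\Big)^{1/2}\Big) \leq \frac{1}{2}K^2\,E(|V|^{2\alpha})\,\varepsilon^{2\alpha} \leq \frac{1}{2}K^2\varepsilon^{2\alpha}.$$

Similar inequalities can be obtained for $\tilde{q}_{2,0,\varepsilon}$ and $\tilde{q}_{0,2,\varepsilon}$. Now it follows directly that, for some $C > 0$,

$$|\tilde{q}_{2,\varepsilon} - q_2| \leq C\varepsilon^{2\alpha}.$$

This proves the assertion.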

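(ii) Note that the conditions $\varepsilon \sim c\,n^{-1/(2\alpha + d/2)}$ and $0 < \alpha \leq d/4$ imply

$$n\varepsilon^d \sim c^d\,n^{1 - d/(2\alpha + d/2)} \to \beta, \quad 0 \leq \beta < \infty, \quad \text{as } \mathbf{n} \to \infty. \qquad (35)$$

From Jammalamadaka and Janson (1986), we get

$$\mathrm{Var}(Q_{2,0,\mathbf{n}}) \sim 4n_1^{-1}(q_{3,0} - q_{2,0}^2) + 2n_1^{-2}\,b_\varepsilon(d)^{-1}q_{2,0} \quad \text{as } \mathbf{n} \to \infty, \qquad (36)$$

and the corresponding result for $\mathrm{Var}(Q_{0,2,\mathbf{n}})$. Furthermore, (12) yields

$$\mathrm{Var}(Q_{1,1,\mathbf{n}}) = \frac{b_\varepsilon(d)^{-2}}{n_1 n_2}\big((n_1 - 1)\,\sigma^2_{1,1,0,1,\mathbf{n}} + (n_2 - 1)\,\sigma^2_{1,1,1,0,\mathbf{n}} + \sigma^2_{1,1,1,1,\mathbf{n}}\big), \qquad (37)$$

where, by Lemma 1,

$$b_\varepsilon(d)^{-2}\sigma^2_{1,1,0,1,\mathbf{n}} = b_\varepsilon(d)^{-2}\,\mathrm{Var}(p_{X,\varepsilon}(Y_1)) \to q_{2,1} - q_{1,1}^2,$$

$$b_\varepsilon(d)^{-2}\sigma^2_{1,1,1,0,\mathbf{n}} = b_\varepsilon(d)^{-2}\,\mathrm{Var}(p_{Y,\varepsilon}(X_1)) \to q_{1,2} - q_{1,1}^2, \qquad (38)$$

$$b_\varepsilon(d)^{-1}\sigma^2_{1,1,1,1,\mathbf{n}} = b_\varepsilon(d)^{-1}\,\mathrm{Var}\big(I(d(X_1, Y_1) < \varepsilon)\big) \to q_{1,1} \quad \text{as } \mathbf{n} \to \infty.$$

It follows from (35)-(38) that

$$\mathrm{Var}(Q_{2,\mathbf{n}}) \leq 3\big(a_0^2\,\mathrm{Var}(Q_{2,0,\mathbf{n}}) + a_2^2\,\mathrm{Var}(Q_{0,2,\mathbf{n}}) + a_1^2\,\mathrm{Var}(Q_{1,1,\mathbf{n}})\big) = O\Big(\frac{1}{n^2\varepsilon^d}\Big) = O\big(n^{-4\alpha/(2\alpha + d/2)}\big) \quad \text{as } \mathbf{n} \to \infty. \qquad (39)$$

We get from (i) that $(\tilde{q}_{2,\varepsilon} - q_2)^2 \leq Cn^{-4\alpha/(2\alpha + d/2)}$, $C > 0$, so (39) implies

$$\mathrm{Var}(Q_{2,\mathbf{n}}) + (\tilde{q}_{2,\varepsilon} - q_2)^2 = O\big(n^{-4\alpha/(2\alpha + d/2)}\big) \quad \text{as } \mathbf{n} \to \infty.$$

Hence, for some $C_2 > 0$, any $A > 0$, and large enough $n_1, n_2$, we obtain by Chebyshev's inequality

$$P\big(|Q_{2,\mathbf{n}} - q_2| > A\,n^{-2\alpha/(2\alpha + d/2)}\big) \leq \frac{\mathrm{Var}(Q_{2,\mathbf{n}}) + (\tilde{q}_{2,\varepsilon} - q_2)^2}{A^2\,n^{-4\alpha/(2\alpha + d/2)}} \leq \frac{C_2}{A^2},$$

and the assertion follows.

(iii) We have

$$\sqrt{n}\,(Q_{2,\mathbf{n}} - q_2) = \sqrt{n}\,(Q_{2,\mathbf{n}} - \tilde{q}_{2,\varepsilon}) + \sqrt{n}\,(\tilde{q}_{2,\varepsilon} - q_2). \qquad (40)$$

Now, when $n\varepsilon^d \to \beta$, $0 < \beta < \infty$, and $\alpha > d/4$, then (i) implies

$$|\sqrt{n}\,(\tilde{q}_{2,\varepsilon} - q_2)| \leq Cn^{1/2}\varepsilon^{2\alpha} = C(n\varepsilon^d)^{1/2}\varepsilon^{2\alpha - d/2} \to 0 \quad \text{as } \mathbf{n} \to \infty,$$

so the assertion follows from Theorem 2(i) and the Slutsky theorem.

(iv) From (i) and the assumption $\varepsilon \sim L(n)\,n^{-2/d}$, we get

$$|n\varepsilon^{d/2}\,(\tilde{q}_{2,\varepsilon} - q_2)| \leq Cn\varepsilon^{d/2 + 2\alpha} \sim C\,L(n)^{d/2 + 2\alpha}\,n^{-4\alpha/d} \to 0 \quad \text{as } \mathbf{n} \to \infty.$$

Therefore, similarly as above, the assertion follows by using the decomposition corresponding to (40), Theorem 2(ii), and the Slutsky theorem. This completes the proof. □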



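Proof of Proposition 4. (i) When $n^2\varepsilon^d \to \infty$ and $n\varepsilon^d \to \beta$, $0 \leq \beta < \infty$, Theorem 2 can be applied with $Q_{2,\mathbf{n}} = D_{2,\mathbf{n}}$. Under $H_0$, we have $\tilde{q}_{2,\varepsilon} = E(D_{2,\mathbf{n}}) = 0$ and $\kappa = 0$, so Theorem 2 yields

$$n\varepsilon^{d/2}\,D_{2,\mathbf{n}} \xrightarrow{D} N(0, \eta) \quad \text{as } \mathbf{n} \to \infty \qquad (41)$$

in this case. Hence, we need to show that, for $D_{2,\mathbf{n}}$ under $H_0$, the proof of Theorem 2 can be modified so that the assumption $n\varepsilon^d \to \beta$, $0 \leq \beta < \infty$, is unnecessary. In fact, this assumption is only needed for the convergence property (34) of $\sigma_{\mathbf{n}}$. Under $H_0$, we get from the definition (19) that $g_{\mathbf{n}}(z) = 0$ and hence $\mathrm{Var}(g_{\mathbf{n}}(Z_1)) = 0$. Therefore, the definition of $\sigma^2_{\mathbf{n}}$ implies

$$\sigma^2_{\mathbf{n}} \sim \frac{n_3^2\,b_\varepsilon(d)}{2}\Big(\frac{a_0^2}{l^2}\,q_{2,0} + \frac{a_2^2}{m^2}\,q_{0,2} + \frac{a_1^2}{2lm}\,q_{1,1}\Big) \quad \text{as } \mathbf{n} \to \infty,$$

and hence (34) can be written

$$n^2\varepsilon^d\binom{n_3}{2}^{-2} b_\varepsilon(d)^{-2}\,\sigma^2_{\mathbf{n}} \to \eta \quad \text{as } \mathbf{n} \to \infty,$$

which does not require convergence of $n\varepsilon^d$. Thus, the assertion follows from (41) and the Slutsky theorem.

(ii) Under $H_1$, we get from Theorem 1 and the Slutsky theorem that $D_{2,\mathbf{n}}/\sqrt{\eta_{\mathbf{n}}} \xrightarrow{P} D_2/\sqrt{\eta}$. Hence,

$$P(T_{\mathbf{n}} > u_n) = 1 - P\big(D_2/\sqrt{\eta} - u_n/(n\varepsilon^{d/2}) \leq D_2/\sqrt{\eta} - D_{2,\mathbf{n}}/\sqrt{\eta_{\mathbf{n}}}\big) \to 1 \quad \text{as } \mathbf{n} \to \infty.$$

This completes the proof. □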

Proof of Proposition 5. The assertion follows straightforwardly from Theorem 3, in a similar way as in Leonenko and Seleznjev (2010). □